XML for the absolute beginner
A guided tour from HTML to processing XML with Java
Summary
In just a few short years, the
World Wide Web and HTML have taken the world by storm. But HTML's
limitations and the ever-increasing demand for more flexibility in
Internet systems have XML, the Extensible Markup Language, brewing on the
horizon. Further, Java applications that move data around need a data
representation format as portable as Java itself. Developers who learn
XML now will find it a powerful tool for data representation, storage,
modelling, and interoperation.
Mark Johnson steps away from his popular JavaBeans
column this month to introduce you to the world of XML: where it came
from, why it's necessary, how it interoperates with existing Internet
technology, and how to use it in your designs. You'll learn about
Cascading Style Sheets and XSL, then follow up with a look at the XML
and Java technology base at a promising Internet startup, with comments
from that company's CEO and technical lead. By the time you've finished
reading Mark's article, you'll understand why so many people are paying
so much attention to this new data representation standard. (11,000
words)
By Mark Johnson

HTML and the World Wide Web are everywhere. As an
example of their ubiquity, I'm going to Central America for Easter this
year, and if I want to, I'll be able to surf the Web, read my e-mail, and
even do online banking from Internet cafés in Antigua Guatemala and Belize
City. (I don't intend to, however, since doing so would take time away
from a date I have with a palm tree and a rum-filled coconut.)
And yet, despite the omnipresence and popularity of HTML, it is
severely limited in what it can do. It's fine for disseminating informal
documents, but HTML is now being used to do things it was never designed
for. Trying to design heavy-duty, flexible, interoperable data systems
from HTML is like trying to build an aircraft carrier with hacksaws and
soldering irons: the tools (HTML and HTTP) just aren't up to the job.
The good news is that many of the limitations of HTML have been
overcome in XML, the Extensible Markup Language. XML is easily
comprehensible to anyone who understands HTML, but it is much more
powerful. More than just a markup language, XML is a metalanguage
-- a language used to define new markup languages. With XML, you can
create a language crafted specifically for your application or domain.
XML will complement, rather than replace, HTML. Whereas HTML is used
for formatting and displaying data, XML represents the contextual meaning
of the data.
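To make that distinction concrete, here is a minimal Java sketch that is not part of the original article. It assumes a modern JDK, where the JAXP parser classes (javax.xml.parsers) and the W3C DOM interfaces ship with the platform. The Product, Name, and Price tags, the sample data, and the class name MeaningDemo are all hypothetical, invented purely for illustration. The same facts are marked up twice: once HTML-style, where the tags describe only appearance, and once with made-up XML tags that say what each piece of data is.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example: tag names and data are invented for illustration.
class MeaningDemo {
    // HTML-style markup: bold and italic say how to display, not what the data is.
    static final String AS_HTML = "<p><b>Widget</b> <i>19.95</i></p>";

    // Made-up XML markup: each tag names the meaning of its content.
    static final String AS_XML =
        "<Product><Name>Widget</Name><Price currency=\"USD\">19.95</Price></Product>";

    // Parse a string of well-formed markup into a DOM tree.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Return the first element with the given tag name, or null if absent.
    static Element first(Document doc, String tag) {
        NodeList found = doc.getElementsByTagName(tag);
        return found.getLength() == 0 ? null : (Element) found.item(0);
    }

    public static void main(String[] args) {
        Document product = parse(AS_XML);
        Element price = first(product, "Price");
        System.out.println(first(product, "Name").getTextContent()
                + " costs " + price.getTextContent()
                + " " + price.getAttribute("currency"));

        // The HTML-style version parses too, but nothing in it is labeled a price.
        System.out.println(first(parse(AS_HTML), "Price") == null);
    }
}
```

A program can ask the XML version for its Price by name and learn the currency as well. The HTML-style version holds the same characters, but a program would have to guess that the italic text happens to be a price.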
This article will present the history of markup languages and how XML
came to be. We'll look at sample data in HTML and move gradually into XML,
demonstrating why it provides a superior way to represent data. We'll
explore the reasons you might need to invent a custom markup language, and
I'll teach you how to do it. We'll cover the basics of XML notation, and
how to display XML with two different sorts of style languages. Then,
we'll dive into the Document Object Model, a powerful tool for
manipulating documents as objects (or manipulating object structures as
documents, depending upon how you look at it). We'll go over how to write
Java programs that extract information from XML documents, with a pointer
to a free program useful for experimenting with these new concepts.
Finally, we'll take a look at an Internet company that's basing its core
technology strategy on XML and Java.
Is XML for you?
Though this article is written for
anyone interested in XML, it has a special relationship to the
JavaWorld series on XML JavaBeans. (See Resources
for links to related articles.) If you've been reading that series and
aren't quite "getting it," this article should clarify how to use XML with
beans. If you are getting it, this article serves as the perfect
companion piece to the XML JavaBeans series, since it covers topics
untouched therein. And, if you're one of the lucky few who still have the
XML JavaBeans articles to look forward to, I recommend that you read the
present article first as introductory material.
A note about Java
There's so much recent XML
activity in the computer world that even an article of this length can
only skim the surface. Still, the whole point of this article is to give
you the context you need to use XML in your Java program designs. This
article also covers how XML operates with existing Web technology, since
many Java programmers work in such an environment.
XML opens the Internet and Java programming to portable, nonbrowser
functionality. XML frees Internet content from the browser in much the
same way Java frees program behavior from the platform. XML makes Internet
content available to real applications.
Java is an excellent platform for using XML, and XML is an outstanding
data representation for Java applications. I'll point out some of Java's
strengths with XML as we go along.
Let's begin with a history lesson.
The origins of markup languages
The HTML we all know and love (well, that we know,
anyway) was originally designed by Tim Berners-Lee at CERN (le Conseil
Européen pour la Recherche Nucléaire, or the European Laboratory for
Particle Physics) in Geneva to allow physics nerds (and even non-nerds) to
communicate with each other. HTML was released in December 1990 within
CERN, and became publicly available in the summer of 1991 for the rest of
us. CERN and Berners-Lee gave away the specifications for HTML, HTTP, and
URLs, in the fine old tradition of Internet share-and-enjoy.
Berners-Lee defined HTML in SGML, the Standard Generalized Markup
Language. SGML, like XML, is a metalanguage -- a language used for
defining other languages. Each so-defined language is called an
application of SGML. HTML is an application of SGML.
SGML emerged from research done primarily at IBM on text document
representation in the late '60s. IBM created GML ("Generalized Markup
Language"), a predecessor language to SGML, and in 1978 the American
National Standards Institute (ANSI) created its first version of SGML. A
first industry standard was released in 1983, a draft international
standard followed in 1985, and the final standard, ISO 8879, was published
in 1986. Interestingly enough,
the first SGML standard was published using an SGML system developed by
Anders Berglund at CERN, the organization that, as we have seen, gave us
HTML and the Web.
SGML is widely used in government and in large industries such as
aerospace, automotive, and telecommunications. SGML is
used as a document standard at the United States Department of Defense and
the Internal Revenue Service. (For readers outside of the US, the IRS are
the tax guys.)
Albert Einstein said everything should be made as simple as possible,
and no simpler. The reason SGML isn't found in more places is that it's
extremely sophisticated and complex. And HTML, which you can find
everywhere, is very simple; for a lot of applications, it's too simple.
Resources
There are so
many XML resources on the Web, I've had to categorize. The first section
here is the most useful, since the documents are either high-level
summaries or excellent link sites. Apologies to anyone who was omitted.
XML and Java: General XML resources
- "XML, Java and the Future of the Web," Jon Bosak. The paper that
started it all, at least from a Java programmer's point of view.
Definitely worth a read, even if it's a bit dated. Jon is commonly
considered to be the father of XML. Funny how all of these technologies
seem to have paternity:
http://metalab.unc.edu/pub/sun-info/standards/xml/why/xmlapps.html
- "Media-Independent Publishing: Four Myths about XML," Jon Bosak:
http://metalab.unc.edu/pub/sun-info/standards/xml/why/4myths.htm
- Robin Cover's XML-SGML site is, according to my SGML buddies, the
bible of XML resources:
http://www.oasis-open.org/cover/
- The W3C's XML resource page lets you cheer from the sidelines as XML
technology proposals develop into recommendations, or join in the fray
on their active mailing lists:
http://www.w3.org/XML/
- OASIS, the Web site of the Organization for the Advancement of
Structured Information Standards, offers general news and information
about XML:
http://www.oasis-open.org/
- The Graphics Communications Association, host of the XTech '99
conference (March 11 to 13, 1999, San Jose, CA) and the upcoming XML
Europe '99 conference in Granada, Spain, (April 26 to 30, 1999) has a
Web site packed with XML information:
http://www.gca.org/
- XML.com is great for watching trends and digging up XML news:
http://www.xml.com/
- Textuality hosts Tim Bray's site. Check it out for a look at the
"big picture" of how XML fits into the structured document universe --
and for a look at Lark, Tim's nonvalidating XML processor:
http://www.textuality.com/
- The XML FAQ:
http://www.ucc.ie/xml/
- IBM's XML Website is an outstanding supplement to alphaWorks:
http://www.software.ibm.com/xml/index.html
XML and Java
- "XML and Java: The Perfect Pair" by Ken Sall (Internet.com, November
1998) provides information about XML, Java, and why these two are a
match made in heaven:
http://wdvl.com/Authoring/Languages/XML/Java/index.html
Tutorials and training
- Generally Markup, Richard Lander's Web site, may be of interest to
you if you haven't yet read enough about markup languages:
http://pdbeam.uwaterloo.ca/~rlander/
- The Mulberry Technologies Web site is a good resource for commercial
training in XML, as well as general XML and SGML consulting by seasoned
SGML experts:
http://www.mulberrytech.com/
- The Web Developer's Virtual Library Series on XML offers good
summaries of various XML technologies, as well as annotated indices of
XML software:
http://wdvl.com/Software/XML
- Microsoft's Site Builder Network provides a series of articles
called "Extreme XML," one of which appears in the following link. While
some of it focuses on Microsoft-only, Windows-only technology, there's
still some great stuff here:
http://www.microsoft.com/sitebuilder/magazine/xml.asp
- Webmonkey has a good series of articles introducing readers to XML.
The index is at:
http://www.hotwired.com/webmonkey/xml/?tw=xml
- "What the ?xml!" by L.C. Rees offers an interesting take on XML and
why it's necessary -- nicely written and entertaining to boot:
http://www.geocities.com/SiliconValley/Peaks/5957/wxml.html
- "The XML Revolution" by Dan Connolly is a quick backgrounder on XML
(Nature):
http://helix.nature.com/webmatters/xml.html
Cascading Style Sheets
- W3C's CSS page will get you started learning about CSS:
http://www.w3.org/Style/CSS/
- "Cascading Style Sheets: Designing for the Web" by Håkon Wium Lie and
Bert Bos (Addison-Wesley, 1997). Sample chapters from the book appear at:
http://www.awl.com/cseng/titles/0-201-41998-X/liebos/
Extensible Style Language (XSL)
- The W3C's XSL page:
http://www.w3.org/Style/XSL/
- Read (and comment on) the W3C's XSL Working Draft (currently dated
December 16, 1998):
http://www.w3.org/TR/WD-xsl
- "The Extensible Style Language: Styling XML Documents"
(WebTechniques Magazine) XSL tutorial information and examples:
http://www.webtechniques.com/features/1999/01/walsh/walsh.shtml
- Microsoft's XML and XSL tutorial site is especially interesting
because of the recent release of client-side XSL in Internet Explorer
5.0. Extensive and excellent:
http://www.microsoft.com/xml
- If you're still using IE 4.0, you can still experiment with XML,
using Microsoft's internal DOM:
http://www.microsoft.com/xml/articles/xmlmodel.asp
- If you want to experiment with XSL, try downloading IBM's LotusXSL.
It's all Java, and for the time being, it's free:
http://www.alphaworks.ibm.com/tech/LotusXSL
- Or, you can try James Clark's XT XSL engine, downloadable from:
http://www.jclark.com/xml/xt.html
Upcoming XSL contest
Though the details aren't yet worked out, Sun Microsystems will soon
announce a call for proposals for a $30,000 grant to develop a
client-side processor for full XSL implementation in Mozilla.
It will also announce, in conjunction with Adobe, a contest (first prize
$40,000, second prize $20,000) to develop a pure-Java, server-side
processor of the entire XSL language, to format XML to PDF (Adobe's
document format). Keep watching the Java Developer Connection (requires
free registration) and Mozilla sites for the eventual announcements.
- "XTech '99: Java and the XML wave" by Mark Johnson
(JavaWorld, April 1999) offers the most current information on
the contest:
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xtech.html
Simple API for XML (SAX)
- The definitive description of SAX is available online. You can also
download free SAX software here:
http://www.megginson.com/SAX/index.html
Document Object Model (DOM)
- The W3C information page for the Document Object Model appears on
the W3C site:
http://www.w3c.org/DOM/
- Among other things, you'll find the W3C Recommendation for DOM Level
1:
http://www.w3.org/TR/REC-DOM-Level-1/
- The Java bindings for DOM, for both XML and HTML, are in this
Recommendation appendix:
http://www.w3.org/TR/REC-DOM-Level-1/java-language-binding.html
- A great DOM tutorial by William Robert Stanek appears on PC
Magazine Online in "Object-Based Web Design." This tutorial
includes a discussion of using DOM with IDL, CORBA's Interface
Definition Language:
http://www8.zdnet.com/pcmag/pctech/content/17/13/tf1713.001.html
Dynamic HTML
- The Dynamic HTML Resource page contains several links to DHTML
articles:
http://www.hotwired.com/webmonkey/dynamic_html/?tw=dynamic_html
Software
- Epicentric, Inc.:
http://www.epicentric.com/
- More XML (and other Java) technology than you can shake a stick at
is available at IBM's alphaWorks:
http://alphaworks.ibm.com/
- Version 2 of IBM's excellent XML parser package, xml4j, is available
for download. This package includes several parsers, both validating and
nonvalidating:
http://www.alphaworks.ibm.com/tech/xml4j
- See also IBM's exciting Bean Markup Language project, which uses XML
to represent and manipulate JavaBeans:
http://www.alphaworks.ibm.com/tech/bml
- Another free Java XML parser was written by the indefatigable James
Clark; download it at:
http://www.jclark.com/xml/xp/index.html
- XEENA is IBM alphaWorks's DTD-guided XML editor. You want it, you
need it, you gotta have it:
http://www.alphaworks.ibm.com/tech/xeena
- Mozilla.org is the open source community's effort to extend the
Netscape source code. Find out about it at:
http://www.mozilla.org/
- Information about XML and CSS in Mozilla appears at:
http://www.mozilla.org/rdf/doc/xml.html
- You can read about Sun's XML and Java initiatives at:
http://www.sun.com/990310/java_xml.jhtml
- In addition, Java Project X includes source code downloadable from:
http://developer.java.sun.com/developer/earlyAccess/xml/index.html
- ArborText has a suite of sophisticated tools for editing SGML, XML,
and XSL:
http://www.arbortext.com/Products/products.html
- Oracle8i from Oracle Corporation uses XML inside the Oracle core:
http://www.oracle.com/xml/
- Download Oracle's free XML for Java parser:
http://technet.oracle.com/direct/3xml.htm
- Microsoft's Internet Explorer 5.0, released this month, implements
part of the XSL spec. You can find it on Microsoft's Web site -- and
also just about anywhere else:
http://www.microsoft.com/windows/ie/default.htm
- You can also download a beta release of Microsoft's XML Notepad
editor (limited to running only on Microsoft Windows):
http://www.microsoft.com/xml/notepad/download.asp
- Vervet Logic of Bloomington, IN, has announced XML <PRO>, a
commercial XML editor:
http://www.vervet.com/
- Majix, to transform XML to HTML via XSL, is available at:
http://www.tetrasix.com/
- If your French is rusty, you might want to try the English-language
site at:
http://www.tetrasix.com/english/default.htm
History
- Read about the history of HTML here. It's part of an online book, so
there's no telling for how long it will be available:
http://ei.cs.vt.edu/~wwwbtb/hardcopy/book/chap4/origins.html
The two chapters listed below (from the book "HTML Unleashed" by Rick
Darnell, et al.) also cover some of the technical background of these
languages.
- SGML history:
http://www.webreference.com/dlab/books/html/3-2.html
- XML history (such as it is):
http://www.webreference.com/dlab/books/html/38-0.html
- Nothing to do on Friday night? Why not read up on the history of
SGML? Charles Goldfarb, considered by many to be the "father of SGML,"
reminisces publicly at:
http://www.sgmlsource.com/Goldfarb/history/index.htm
- Useful XML and SGML information appears at Goldfarb's Web site,
including a comprehensive XML book list:
http://www.sgmlsource.com/
Miscellaneous links
- Uche Ogbuji has written an interesting article in
LinuxWorld about using XML on Linux in the Enterprise. It's at:
http://www.linuxworld.com/linuxworld/lw-1999-03/lw-03-xml.html
- Bluestone Software has recently made a splash with pure-Java XML
application servers, and a freely downloadable Swing package called
XwingML:
http://www.bluestone.com/
- Everyone (except Microsoft) is pretty freaked out about the US
Patent Office awarding Microsoft a patent for certain kinds of
functionality in style sheets. What happens with this patent, and its
impact on developing technology, remains to be seen. Judge for yourself
by reading the patent at:
http://www.patents.ibm.com/patlist?icnt=US&patent_number=5860073
- The title of the sample recipe is actually the title of a very funny
song by William Bolcom. Similar recipes may be found at:
http://www.b4uby.com/granny/gsoup.htm
- The song appears on a compact disc (with other odd songs) available
from the Public Radio Music Source at:
http://75music.org/best/docs/keepers.htm